As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning
Omission and addition of content is a typical issue in neural machine translation. We propose a method for detecting such phenomena with off-the-shelf translation models. Using contrastive conditioning, we compare the likelihood of a full sequence under a translation model to the likelihood of its parts, given the corresponding source or target sequence. This allows us to pinpoint superfluous words in the translation and untranslated words in the source even in the absence of a reference translation. The accuracy of our method is comparable to that of a supervised method that requires a custom quality estimation model.
Comment: ACL 202
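The core comparison can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the `log_prob` stub (a bilingual lexicon) stands in for per-token log-probabilities from a real off-the-shelf translation model, and only the overtranslation direction is shown.

```python
import math

# Toy bilingual lexicon standing in for an NMT model. In a real setup,
# log_prob would query an off-the-shelf translation model for log p(tgt | src).
LEXICON = {"das": "the", "Haus": "house", "ist": "is", "rot": "red"}

def log_prob(src_tokens, tgt_tokens):
    """Toy stand-in for log p(tgt | src): target tokens that translate some
    source token get high probability, all others get low probability."""
    supported = {LEXICON[s] for s in src_tokens if s in LEXICON}
    return sum(math.log(0.9 if t in supported else 0.01) for t in tgt_tokens)

def superfluous_words(src_tokens, tgt_tokens):
    """Flag target words whose removal increases the length-normalised
    likelihood of the remaining translation (overtranslation detection)."""
    full = log_prob(src_tokens, tgt_tokens) / len(tgt_tokens)
    flagged = []
    for i, t in enumerate(tgt_tokens):
        rest = tgt_tokens[:i] + tgt_tokens[i + 1:]
        if log_prob(src_tokens, rest) / len(rest) > full:
            flagged.append(t)
    return flagged
```

Here `superfluous_words(["das", "Haus", "ist", "rot"], ["the", "house", "is", "very", "red"])` flags only "very". The undertranslation direction is symmetric: score partial source sequences given the target under a reverse-direction model.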
Improving Zero-Shot Cross-lingual Transfer Between Closely Related Languages by Injecting Character-Level Noise
Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity. However, current approaches that operate in the embedding space do not take surface similarity into account. This work presents a simple yet effective strategy to improve cross-lingual transfer between closely related varieties. We propose to augment the data of the high-resource source language with character-level noise to make the model more robust towards spelling variations. Our strategy shows consistent improvements over several languages and tasks: zero-shot transfer of POS tagging and topic identification between language varieties from the Finnic, West and North Germanic, and Western Romance language branches. Our work provides evidence for the usefulness of simple surface-level noise in improving transfer between language varieties.
Comment: ACL 202
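A minimal sketch of this kind of augmentation, assuming a simple per-character scheme of random deletion, duplication, or substitution. The exact operations and noise rate used by the authors may differ; this only illustrates the general idea.

```python
import random

def inject_char_noise(text, noise_prob=0.1, rng=None):
    """Randomly delete, duplicate, or substitute alphabetic characters to
    simulate spelling variation between closely related language varieties.
    This is a hypothetical recipe, not the paper's exact configuration."""
    rng = rng or random.Random(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < noise_prob:
            op = rng.choice(["delete", "duplicate", "substitute"])
            if op == "delete":
                continue  # drop the character entirely
            elif op == "duplicate":
                out.append(ch + ch)  # keep the character and repeat it
            else:
                out.append(rng.choice(alphabet))  # replace with a random letter
        else:
            out.append(ch)  # non-alphabetic characters are left untouched
    return "".join(out)
```

Applied to the source-language training data before fine-tuning, this makes the model less sensitive to the surface differences that separate a standard language from its dialects.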
X-stance: A Multilingual Multi-Target Dataset for Stance Detection
We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland. The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection. It contains 67 000 comments on more than 150 political issues (targets). Unlike stance detection models that are built for specific target issues, we use the dataset to train a single model on all the issues. To make learning across targets possible, we prepend to each instance a natural question that represents the target (e.g. "Do you support X?"). Baseline results from multilingual BERT show that zero-shot cross-lingual and cross-target transfer of stance detection is moderately successful with this approach.
Comment: SwissText + KONVENS 2020. Data and code are available at https://github.com/ZurichNLP/xstanc
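The target-as-question formatting is straightforward to sketch. The exact question template and field names below are illustrative assumptions, not necessarily the dataset's wording:

```python
def make_instance(target_issue, comment):
    """Prepend a natural question representing the target, so one model can
    be trained across all issues (question wording is an assumption here)."""
    return f"Do you support {target_issue}? {comment}"

# Hypothetical raw examples: (target issue, candidate comment)
raw = [
    ("a higher retirement age", "I am in favour."),
    ("stricter immigration rules", "No, this goes too far."),
]
inputs = [make_instance(t, c) for t, c in raw]
```

Because the target is expressed in natural language rather than as a fixed label, a multilingual encoder can generalise to unseen targets at test time.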
Contrastive Conditioning for Assessing Disambiguation in MT: A Case Study of Distilled Bias
Lexical disambiguation is a major challenge for machine translation systems, especially if some senses of a word are trained less often than others. Identifying patterns of overgeneralization requires evaluation methods that are both reliable and scalable. We propose contrastive conditioning as a reference-free black-box method for detecting disambiguation errors. Specifically, we score the quality of a translation by conditioning on variants of the source that provide contrastive disambiguation cues. After validating our method, we apply it in a case study to perform a targeted evaluation of sequence-level knowledge distillation. By probing word sense disambiguation and translation of gendered occupation names, we show that distillation-trained models tend to overgeneralize more than other models with a comparable BLEU score. Contrastive conditioning thus highlights a side effect of distillation that is not fully captured by standard evaluation metrics. Code and data to reproduce our findings are publicly available.
Iterative, MT-based Sentence Alignment of Parallel Texts
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011).
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 175-182.
© 2011 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt .
Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/16955
On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation
The standard training algorithm in neural machine translation (NMT) suffers
from exposure bias, and alternative algorithms have been proposed to mitigate
this. However, the practical impact of exposure bias is under debate. In this
paper, we link exposure bias to another well-known problem in NMT, namely the
tendency to generate hallucinations under domain shift. In experiments on three
datasets with multiple test domains, we show that exposure bias is partially to
blame for hallucinations, and that training with Minimum Risk Training, which
avoids exposure bias, can mitigate this. Our analysis explains why exposure
bias is more problematic under domain shift, and also links exposure bias to
the beam search problem, i.e. performance deterioration with increasing beam
size. Our results provide a new justification for methods that reduce exposure
bias: even if they do not increase performance on in-domain test sets, they can
increase model robustness to domain shift.
Comment: ACL 202
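Minimum Risk Training replaces the token-level likelihood objective with the expected cost of translations sampled from the model itself, which is why it sidesteps exposure bias. A minimal sketch of the objective's value for one source sentence, assuming sample log-probabilities and per-sample risks (e.g. 1 minus sentence-level BLEU) are already computed:

```python
import math

def expected_risk(sample_logps, risks, alpha=1.0):
    """Value of the MRT objective for one source sentence: the expected cost
    of sampled translations under the sharpened, renormalised model
    distribution. In training, gradients of this quantity w.r.t. the model
    parameters are what drive learning; here we only compute its value."""
    weights = [math.exp(alpha * lp) for lp in sample_logps]
    z = sum(weights)
    return sum(w / z * r for w, r in zip(weights, risks))
```

Shifting probability mass onto low-risk samples lowers the objective, so minimising it trains the model on its own output distribution rather than on gold prefixes.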
Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search – the de facto standard inference algorithm in NMT – and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
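The decision rule itself is simple: pick the candidate with the highest expected utility against a set of model samples. A toy sketch, where the unigram-F1 utility is a stand-in for the MT metrics the paper actually studies:

```python
def unigram_f1(a, b):
    """Toy utility: F1 over the sets of whitespace tokens of two strings.
    A real MBR setup would use an MT metric here instead."""
    ta, tb = set(a.split()), set(b.split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def mbr_decode(candidates, samples, utility):
    """Return the candidate with the highest average utility against the
    model samples (the Minimum Bayes Risk decision rule)."""
    def expected_utility(cand):
        return sum(utility(cand, s) for s in samples) / len(samples)
    return max(candidates, key=expected_utility)
```

In the common sampling-based setup, the candidate list and the sample list are the same set of unbiased model samples, so the chosen translation is the one most similar, on average, to everything else the model considers plausible.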